31 research outputs found

    MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding

    Given an untrimmed video and a natural language query, video sentence grounding aims to localize the target temporal moment in the video. Existing methods mainly tackle this task by matching and aligning the semantics of the descriptive sentence and video segments at a single temporal resolution, neglecting the temporal consistency of video content across resolutions. In this work, we propose a novel multi-resolution temporal video sentence grounding network, MRTNet, which consists of a multi-modal feature encoder, a Multi-Resolution Temporal (MRT) module, and a predictor module. The MRT module is an encoder-decoder network whose decoder-side output features are combined with Transformers to predict the final start and end timestamps. Notably, the MRT module is hot-pluggable: it can be seamlessly incorporated into any anchor-free model. In addition, we use a hybrid loss to supervise the cross-modal features in the MRT module for more accurate grounding at three scales: frame level, clip level, and sequence level. Extensive experiments on three prevalent datasets demonstrate the effectiveness of MRTNet.
    Comment: work in progress
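    The abstract includes no code, but the multi-resolution idea is easy to picture. Below is a minimal sketch, assuming fused video-text frame features of shape (batch, time, dim); the class name, level count, and the use of strided 1D convolutions are illustrative choices, not the authors' MRTNet implementation.

```python
# Hypothetical sketch of a multi-resolution temporal encoder (not the
# authors' MRTNet code): each stage halves the temporal resolution.
import torch
import torch.nn as nn

class MultiResolutionEncoder(nn.Module):
    def __init__(self, dim: int = 256, num_levels: int = 3):
        super().__init__()
        # Strided 1D convolutions halve the temporal length per level.
        self.stages = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, stride=2, padding=1)
            for _ in range(num_levels)
        )

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # x: (batch, time, dim) fused video-text features.
        feats, h = [x], x.transpose(1, 2)  # -> (batch, dim, time)
        for conv in self.stages:
            h = torch.relu(conv(h))
            feats.append(h.transpose(1, 2))
        return feats  # one feature tensor per temporal resolution

frames = torch.randn(2, 64, 256)   # 64 frames, 256-d features
pyramid = MultiResolutionEncoder()(frames)
print([f.shape for f in pyramid])  # temporal lengths 64 -> 32 -> 16 -> 8
```

    A predictor head could then consume each level of this pyramid, which is what makes such a module easy to plug into an existing anchor-free grounding model.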

    Dual Preference Distribution Learning for Item Recommendation

    Recommender systems automatically recommend items that users are likely to enjoy. Their goal is to model the user-item interaction by effectively representing users and items. Existing methods primarily learn users' preferences and items' features as vectorized embeddings and model a user's general preference for an item through their interaction. In practice, however, users hold specific preferences for item attributes, and these preferences are usually related to one another. Exploring such fine-grained preferences, as well as modeling the relationships among a user's different preferences, could therefore improve recommendation performance. Toward this end, we propose a dual preference distribution learning framework (DUPLE), which jointly learns a general preference distribution and a specific preference distribution for a given user, where the former corresponds to the user's general preference for items and the latter refers to the user's specific preference for item attributes. Notably, the mean vector of each Gaussian distribution captures the user's preferences, while the covariance matrix learns their relationships. Moreover, we can summarize a preferred attribute profile for each user, depicting his/her preferred item attributes, and then explain each recommended item by checking the overlap between its attributes and the user's preferred attribute profile. Extensive quantitative and qualitative experiments on six public datasets demonstrate the effectiveness and explainability of the DUPLE method.
    Comment: 23 pages, 7 figures. This manuscript has been accepted by ACM Transactions on Information Systems
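    As a rough illustration of how a Gaussian preference distribution can score items, here is a minimal sketch: the mean vector plays the role of the user's preference, and the covariance (diagonal here, for simplicity) encodes how preference dimensions relate. The function names and the diagonal-covariance simplification are assumptions, not DUPLE's actual formulation.

```python
# Hypothetical sketch of Gaussian preference scoring in the spirit of
# DUPLE (names and the diagonal-covariance choice are my assumptions).
import torch
from torch.distributions import MultivariateNormal

def preference_score(user_mean, user_log_var, item_emb):
    """Score an item by its log-likelihood under the user's Gaussian.

    The mean encodes what the user likes; the covariance encodes how
    the preference dimensions co-vary / how tolerant each one is.
    """
    cov = torch.diag_embed(user_log_var.exp())  # keep covariance positive
    dist = MultivariateNormal(user_mean, covariance_matrix=cov)
    return dist.log_prob(item_emb)

d = 16
user_mean, user_log_var = torch.zeros(d), torch.zeros(d)
item = torch.randn(d)
print(preference_score(user_mean, user_log_var, item))
```

    Ranking items by this log-likelihood would realize the "general preference" side; a second distribution over attribute embeddings would play the "specific preference" role.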

    Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation

    Multi-modal recommendation systems, which integrate diverse types of information, have gained widespread attention in recent years. However, compared to traditional collaborative-filtering-based multi-modal recommendation systems, research on multi-modal sequential recommendation is still in its nascent stages. Unlike traditional sequential recommendation models, which rely solely on item identifier (ID) information and focus on network structure design, multi-modal recommendation models need to emphasize item representation learning and the fusion of heterogeneous data sources. This paper investigates the impact of item representation learning on downstream recommendation tasks and examines the disparities in information fusion at different stages. Empirical experiments demonstrate the need for a framework suited to the collaborative learning and fusion of diverse information. Based on this, we propose a new model-agnostic framework for multi-modal sequential recommendation tasks, called Online Distillation-enhanced Multi-modal Transformer (ODMT), to enhance feature interaction and mutual learning among multi-source inputs (ID, text, and image) while avoiding conflicts among different features during training, thereby improving recommendation accuracy. Specifically, we first introduce an ID-aware Multi-modal Transformer module in the item representation learning stage to facilitate information interaction among the different features. Second, we employ an online distillation training strategy in the prediction optimization stage so that the multi-source data learn from each other, improving prediction robustness. Experimental results on a video content recommendation dataset and three e-commerce recommendation datasets demonstrate the effectiveness of the two proposed modules, which yield an approximately 10% performance improvement over baseline models.
    Comment: 11 pages, 7 figures
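    To make the online distillation step concrete, the sketch below shows a standard mutual-distillation loss in which each branch (ID, text, image) is pulled toward the averaged predictions of its peers. This is a generic formulation under assumed names, temperature, and weighting; ODMT's exact loss may differ.

```python
# Hypothetical sketch of online (mutual) distillation across the ID,
# text, and image branches; loss names and weighting are assumptions.
import torch
import torch.nn.functional as F

def mutual_distillation_loss(logits_per_branch, temperature=2.0):
    """Each branch is pulled toward the average of its peers' predictions."""
    t = temperature
    probs = [F.softmax(l / t, dim=-1) for l in logits_per_branch]
    loss = 0.0
    for i, l in enumerate(logits_per_branch):
        peers = torch.stack([p for j, p in enumerate(probs) if j != i])
        target = peers.mean(dim=0).detach()  # peers act as the teacher
        log_p = F.log_softmax(l / t, dim=-1)
        loss = loss + F.kl_div(log_p, target, reduction="batchmean") * t * t
    return loss / len(logits_per_branch)

id_l, txt_l, img_l = (torch.randn(4, 100) for _ in range(3))
print(mutual_distillation_loss([id_l, txt_l, img_l]))
```

    Added to each branch's task loss, a term like this lets the three information sources regularize one another during training without a separate pre-trained teacher.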

    Target-Guided Composed Image Retrieval

    Composed image retrieval (CIR) is a new and flexible image retrieval paradigm that retrieves the target image for a multimodal query consisting of a reference image and its corresponding modification text. Although existing efforts have achieved compelling success, they overlook two issues: modeling the conflict relationship between the reference image and the modification text, which would improve multimodal query composition, and adaptively modeling matching degrees, which would improve the ranking of candidate images that match the given query to different extents. To address these two limitations, we propose a Target-Guided Composed Image Retrieval network (TG-CIR). In particular, TG-CIR first extracts unified global and local attribute features for the reference/target image and the modification text with the contrastive language-image pre-training model (CLIP) as the backbone, where an orthogonal regularization is introduced to promote independence among the attribute features. TG-CIR then designs a target-query relationship-guided multimodal query composition module, comprising a target-free student composition branch and a target-based teacher composition branch, where the target-query relationship is injected into the teacher branch to guide the student branch's conflict relationship modeling. Last, apart from the conventional batch-based classification loss, TG-CIR additionally introduces a batch-based target similarity-guided matching degree regularization to promote the metric learning process. Extensive experiments on three benchmark datasets demonstrate the superiority of our proposed method.
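    The orthogonal regularization mentioned above admits a simple generic form: penalize the off-diagonal entries of the Gram matrix of the attribute features. The sketch below is one such formulation, with assumed shapes and names rather than TG-CIR's exact term.

```python
# Hypothetical sketch of the orthogonal regularization idea: push the
# Gram matrix of attribute features toward the identity. Names assumed.
import torch

def orthogonal_regularization(attr_feats: torch.Tensor) -> torch.Tensor:
    """attr_feats: (num_attributes, dim), one row per attribute feature."""
    feats = torch.nn.functional.normalize(attr_feats, dim=-1)
    gram = feats @ feats.t()  # pairwise cosine similarities
    eye = torch.eye(gram.size(0), device=gram.device)
    # Penalize any correlation between distinct attribute features.
    return ((gram - eye) ** 2).sum()

attrs = torch.randn(8, 512)
print(orthogonal_regularization(attrs))
```

    Driving pairwise similarities toward zero encourages each attribute feature to capture a distinct factor, which is the independence property the abstract describes.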

    General Debiasing for Multimodal Sentiment Analysis

    Existing work on Multimodal Sentiment Analysis (MSA) utilizes multimodal information for prediction, yet unavoidably suffers from fitting spurious correlations between multimodal features and sentiment labels. For example, if most videos with a blue background in a dataset have positive labels, a model will rely on this correlation for prediction, even though "blue background" is not a sentiment-related feature. To address this problem, we define a general debiasing MSA task, which aims to enhance the Out-Of-Distribution (OOD) generalization ability of MSA models by reducing their reliance on spurious correlations. To this end, we propose a general debiasing framework based on Inverse Probability Weighting (IPW), which adaptively assigns small weights to samples with larger bias (i.e., stronger spurious correlations). The key to this framework is estimating the bias of each sample, which is achieved in two steps: 1) disentangling the robust features and biased features in each modality, and 2) using the biased features to estimate the bias. Finally, we employ IPW to reduce the influence of heavily biased samples, facilitating robust feature learning for sentiment prediction. To examine the model's generalization ability, we keep the original testing sets on two benchmarks and additionally construct multiple unimodal and multimodal OOD testing sets. The empirical results demonstrate the superior generalization ability of our proposed framework. We have released the code and data to facilitate reproduction.
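    A minimal sketch of the IPW idea follows: given a per-sample bias estimate (which in the paper comes from the disentangled biased features; here it is simply an input tensor), samples with stronger spurious correlations receive smaller training weights. The names and the normalization scheme are assumptions.

```python
# Hypothetical sketch of Inverse Probability Weighting for debiasing:
# samples with a larger estimated bias get a smaller training weight.
# The bias estimate itself (from the biased-feature branch) is assumed.
import torch
import torch.nn.functional as F

def ipw_loss(logits, labels, bias_scores, eps=1e-6):
    """bias_scores in [0, 1]: higher means stronger spurious correlation."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    weights = 1.0 / (bias_scores + eps)  # inverse-probability weights
    weights = weights / weights.sum()    # normalize over the batch
    return (weights * per_sample).sum()

logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 1])
bias = torch.tensor([0.9, 0.1, 0.5, 0.2])  # e.g. from biased features
print(ipw_loss(logits, labels, bias))
```

    With this weighting, the heavily biased first sample contributes least to the gradient, so the model is pushed to rely on features that survive across the distribution.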

    Leveraging Multimodal Features and Item-level User Feedback for Bundle Construction

    Automatic bundle construction is a crucial prerequisite for various bundle-aware online services. Previous approaches are mostly designed to model the bundling strategy of existing bundles. However, it is hard to acquire a large-scale, well-curated bundle dataset, especially for platforms that have not offered bundle services before. Even on platforms with mature bundle services, many items are included in few or even zero bundles, giving rise to sparsity and cold-start challenges for bundle construction models. To tackle these issues, we leverage multimodal features, item-level user feedback signals, and bundle composition information to achieve a comprehensive formulation of bundle construction. Nevertheless, this formulation poses two new technical challenges: 1) how to learn effective representations by optimally unifying multiple features, and 2) how to address the missing-modality, noise, and sparsity problems induced by incomplete query bundles. To address these challenges, we propose a Contrastive Learning-enhanced Hierarchical Encoder method (CLHE). Specifically, we use self-attention modules to combine the multimodal and multi-item features, and then leverage both item- and bundle-level contrastive learning to enhance representation learning, thereby countering the missing-modality, noise, and sparsity problems. Extensive experiments on four datasets in two application domains demonstrate that our method outperforms a range of SOTA methods. The code and dataset are available at https://github.com/Xiaohao-Liu/CLHE
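    The item- and bundle-level contrastive objective can be illustrated with a standard InfoNCE loss over two views of the same bundles, as sketched below; CLHE's actual augmentations and view construction are not reproduced here, so everything beyond the generic loss is an assumption (the released code at the repository above is authoritative).

```python
# Hypothetical sketch of a bundle-level contrastive objective
# (a standard InfoNCE; CLHE's exact views/augmentations are assumed).
import torch
import torch.nn.functional as F

def info_nce(view_a: torch.Tensor, view_b: torch.Tensor, tau: float = 0.2):
    """Two augmented views of the same bundles: (batch, dim) each."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / tau           # similarity of every cross-view pair
    targets = torch.arange(a.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(16, 64), torch.randn(16, 64)
print(info_nce(z1, z2))
```

    Applying the same loss at the item level (e.g., with a modality dropped in one view) is one way such an objective can make representations robust to missing modalities.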

    Federated Class-Incremental Learning with Prompting

    As Web technology continues to develop, it has become increasingly common to use data stored on different clients. Federated learning has meanwhile received widespread attention for its ability to protect data privacy while letting models learn from data distributed across various clients. However, most existing works assume that clients' data are fixed. In real-world scenarios, such an assumption rarely holds, as data may be continuously generated and new classes may appear. To this end, we focus on the practical and challenging federated class-incremental learning (FCIL) problem. In FCIL, the local and global models may suffer from catastrophic forgetting of old classes caused by the arrival of new classes, and the data distributions across clients are non-independent and identically distributed (non-iid). In this paper, we propose a novel method called Federated Class-Incremental Learning with PrompTing (FCILPT). Given the privacy and memory constraints, FCILPT does not use a rehearsal buffer to keep exemplars of old data. Instead, we use prompts to ease the catastrophic forgetting of old classes. Specifically, we encode task-relevant and task-irrelevant knowledge into prompts, preserving the old and new knowledge of the local clients and alleviating catastrophic forgetting. Before global aggregation, we sort the task information in the prompt pools of the local clients to align the task information across clients; this ensures that the knowledge of the same task is fully integrated, mitigating the non-iid problem caused by the lack of classes among different clients within the same incremental task. Experiments on CIFAR-100, Mini-ImageNet, and Tiny-ImageNet demonstrate that FCILPT achieves significant accuracy improvements over state-of-the-art methods.
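    To illustrate the sorting-and-alignment step before aggregation, here is a minimal sketch that keys each client's prompt pool by task ID, sorts the shared task IDs, and averages prompts per task across the clients that hold them. The data structures and the plain averaging are assumptions, not FCILPT's exact aggregation rule.

```python
# Hypothetical sketch of aligning prompt pools before federated
# aggregation: sorting by task key ensures position i means the same
# task on every client. Pool structure and averaging are assumed.
import torch

def align_and_average(client_pools):
    """client_pools: list of dicts {task_id: prompt tensor (len, dim)}."""
    task_ids = sorted(set().union(*(p.keys() for p in client_pools)))
    merged = {}
    for tid in task_ids:
        # Average this task's prompt over the clients that have it.
        prompts = [p[tid] for p in client_pools if tid in p]
        merged[tid] = torch.stack(prompts).mean(dim=0)
    return merged

pool_a = {0: torch.randn(5, 768), 1: torch.randn(5, 768)}
pool_b = {1: torch.randn(5, 768), 2: torch.randn(5, 768)}
print({t: p.shape for t, p in align_and_average([pool_a, pool_b]).items()})
```

    Without this alignment, naively averaging pools position-by-position could mix prompts from different tasks whenever clients saw different class subsets, which is exactly the non-iid failure mode the paper targets.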

    HDAC8 Inhibition Specifically Targets Inv(16) Acute Myeloid Leukemic Stem Cells by Restoring p53 Acetylation

    Summary
    Acute myeloid leukemia (AML) is driven and sustained by leukemia stem cells (LSCs) with unlimited self-renewal capacity and resistance to chemotherapy. Mutation in the TP53 tumor suppressor is relatively rare in de novo AML; however, p53 can be regulated through post-translational mechanisms. Here, we show that p53 activity is inhibited in inv(16)+ AML LSCs via interactions with the CBFβ-SMMHC (CM) fusion protein and histone deacetylase 8 (HDAC8). HDAC8 aberrantly deacetylates p53 and promotes LSC transformation and maintenance. HDAC8 deficiency or inhibition using HDAC8-selective inhibitors (HDAC8i) effectively restores p53 acetylation and activity. Importantly, HDAC8 inhibition induces apoptosis in inv(16)+ AML CD34+ cells while sparing normal hematopoietic stem cells. Furthermore, in vivo HDAC8i administration profoundly diminishes AML propagation and abrogates the leukemia-initiating capacity of both murine and patient-derived LSCs. This study elucidates an HDAC8-mediated p53-inactivating mechanism promoting LSC activity and highlights HDAC8 inhibition as a promising approach to selectively target inv(16)+ LSCs.

    Exploring potential genes and mechanisms linking erectile dysfunction and depression

    Background
    The clinical correlation between erectile dysfunction (ED) and depression has been revealed in cumulative studies. However, evidence of shared mechanisms between them remains insufficient. This study aimed to explore common transcriptomic alterations associated with ED and depression.
    Materials and methods
    The gene sets associated with ED and depression were collected from the Gene Expression Omnibus (GEO) database. Comparative analysis was conducted to obtain common genes. Using R software and other appropriate tools, we conducted a range of analyses, including function enrichment, interactive network creation, gene cluster analysis, and transcriptional and post-transcriptional signature profiling. Candidate hub crosslinks between ED and depression were selected after external validation and molecular experiments. Furthermore, the subpopulation location and disease association of the hub genes were explored.
    Results
    A total of 85 common genes were identified between ED and depression. These genes correlate strongly with cell adhesion, redox homeostasis, the reactive oxygen species metabolic process, and the neuronal cell body. An interactive network consisting of 80 proteins and 216 interactions was thereby developed. Analysis of the proteomic signature of the common genes highlighted eight major shared genes: CLDN5, COL7A1, LDHA, MAP2K2, RETSAT, SEMA3A, TAGLN, and TBC1D1. These genes are involved in blood vessel morphogenesis and muscle cell activity. A subsequent transcription factor (TF)-miRNA network showed 47 TFs and 88 miRNAs relevant to the shared genes. Finally, CLDN5 and TBC1D1 were well validated and identified as the hub crosslinks between ED and depression. These genes have specific subpopulation locations in the corpus cavernosum and brain tissue, respectively.
    Conclusion
    Our study is the first to investigate the common transcriptomic alterations and shared biological roles of ED and depression. These findings provide insight into the molecular mechanisms underlying the co-existence of depression and ED.
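    As a toy illustration of the comparative step that yields the common genes, the snippet below intersects two differentially expressed gene sets; the gene lists are placeholders drawn from the abstract's named genes, not the study's actual GEO-derived data.

```python
# Hypothetical sketch of the comparative analysis step: intersect
# differentially expressed gene (DEG) sets from the ED and depression
# cohorts. The gene lists below are placeholders, not the study's data.
ed_degs = {"CLDN5", "COL7A1", "LDHA", "MAP2K2", "SEMA3A"}
depression_degs = {"CLDN5", "TBC1D1", "MAP2K2", "TAGLN", "RETSAT"}

common = sorted(ed_degs & depression_degs)
print(f"{len(common)} common genes:", common)  # candidates for enrichment
```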